subsys: samples: doc: arch: Make perf subsystem #72890
Hello @KushnerovMikhail, and thank you very much for your first pull request to the Zephyr project!
cfriedt
left a comment
✨.Beautiful ✨.
doc/services/debugging/perf.rst
Outdated
- ./FlameGraph/flamegraph.pl perf_buf.foolded > graph.svg
+ ./FlameGraph/flamegraph.pl perf_buf.folded > graph.svg
Thanks for the comment. Corrected it.
doc/services/debugging/perf.rst
Outdated
- python zephyr/scripts/perf/stackcollapse.py perf_buf <build_dir>/zephyr/zephyr.elf > perf_buf.foolded
+ python zephyr/scripts/perf/stackcollapse.py perf_buf <build_dir>/zephyr/zephyr.elf > perf_buf.folded
Thanks for the comment. Corrected it.
nashif
left a comment
Please split the commit into multiple commits, i.e. one commit per arch, then the docs, the subsystem code, and finally the sample.
Also, perf itself does not qualify as a standalone subsystem. I suggest putting this under subsys/profiling/
830c289 to 20d23f2
20d23f2 to 4fe8184
arch/riscv/core/perf.c
Outdated
this was recently deprecated, can you please update? @ycsin fyi
I am thinking about using the API provided in #73587, but I have two issues with it:
- The RISC-V implementation does not handle the case where the interrupt occurs in a function that does not save $ra in its stack frame (this can happen if the function does not call any other function). This seems to lead to undefined behaviour (in this case frame->ra = $fp and frame->fp = <undefined>).
- On RISC-V the context-saving code is the same for exceptions and interrupts, so during an interrupt the registers are saved according to struct arch_esf. On x86, however, context saving differs between interrupts and exceptions, and the registers saved during an interrupt do not match struct arch_esf. Perf unwinds the stack during an interrupt.
Fixed deprecated code
I think you are right, similar handling exists in Linux's implementation
4fe8184 to eadbfb2
I think from the project perspective it makes sense for applications/kernel to use the same stack-unwinding function so that there's less code duplication and it is easier to maintain in the long run, which is what the I can probably fix the issue that you mentioned in RISC-V (edit: #76045 (edit: not correct yet (edit: should work now))). However, I do not know how stack unwinding works in x86/arm64, but I can massage their existing implementations to use the
ycsin
left a comment
Should add a new entry to https://github.com/zephyrproject-rtos/zephyr/blob/main/MAINTAINERS.yml
arch/x86/core/ia32/intstub.S
Outdated
The changes to this file could probably be in another commit / PR to document this change better
Now we're cooking
looks good, will take another look so we can get it ready when main branch opens
Add profiling subsystem. Add a perf utility based on periodic stack unwinding. Perf from Linux was taken as a reference. The module relies on frame pointers and on the registers saved during interrupt handling. The unwinding function is registered as the timer's expiry function, so it is called during interrupt handling. The function therefore has access to the saved registers of the current thread (the program counter and frame pointer in particular) and uses them to unwind the thread's stack. Starting the timer and printing the results are exposed as shell commands for convenience. Originally-by: Yonatan Goldschmidt <[email protected]> Signed-off-by: Mikhail Kushnerov <[email protected]>
Implement a stack trace function for the riscv arch that gets the required thread register values and unwinds the stack with them. Signed-off-by: Mikhail Kushnerov <[email protected]>
Implement a stack trace function for the x86_64 arch that gets the required thread register values and unwinds the stack with them. Originally-by: Yonatan Goldschmidt <[email protected]> Signed-off-by: Mikhail Kushnerov <[email protected]>
Implement a stack trace function for the x86_32 arch that gets the required thread register values and unwinds the stack with them. Signed-off-by: Mikhail Kushnerov <[email protected]>
Samples obtained by the perf profiling tool can be translated into a flame graph using the stackcollapse.py script. Originally-by: Yonatan Goldschmidt <[email protected]> Signed-off-by: Mikhail Kushnerov <[email protected]>
Add doc for the perf profiling tool. Signed-off-by: Mikhail Kushnerov <[email protected]>
The operation of the perf tool can be checked using this sample. Signed-off-by: Mikhail Kushnerov <[email protected]>
eadbfb2 to bc9b550
Hi @KushnerovMikhail! To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge. Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁
this new subsystem likely needs to be covered in the MAINTAINERS file, as it is orphan right now. cc @nashif
Add a profiling utility based on periodic stack unwinding. Perf from Linux was taken as a reference.
Code is based on Sampling profiler (#29304) by @Jongy.
The module relies on frame pointers and on Zephyr saving registers during interrupt handling.
General work description:
The unwinding function is registered as a timer expiry function, so it is called during interrupt handling. The function therefore has access to the saved registers of the current thread (the program counter and frame pointer in particular) and uses them to unwind the thread's stack.
Starting the timer and printing the results are exposed as shell commands for convenience.
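To make the unwinding step concrete, here is a minimal Python model of a frame-pointer walk. It is only a sketch, not the code from this PR: the frame record layout (previous frame pointer at `fp`, return address one word above it), the 8-byte word size, and the names `unwind`, `memory`, `stack_start` and `stack_size` are all assumptions chosen for illustration. The real layout differs per architecture. The bounds check mirrors why the thread stack start and size are needed.

```python
WORD = 8  # assumed word size in bytes (64-bit target)


def unwind(memory, pc, fp, stack_start, stack_size, max_depth=32):
    """Return the call chain [pc, return address, ...], innermost frame first.

    `memory` maps addresses to words; the frame record layout is hypothetical.
    """
    stack_end = stack_start + stack_size
    trace = [pc]
    while len(trace) < max_depth:
        # The two-word frame record must be aligned and inside the thread stack.
        if fp % WORD != 0 or fp < stack_start or fp + 2 * WORD > stack_end:
            break
        prev_fp = memory.get(fp, 0)
        ret_addr = memory.get(fp + WORD, 0)
        if ret_addr == 0:
            break
        trace.append(ret_addr)
        # The stack grows down, so a caller's frame must sit at a higher address.
        if prev_fp <= fp:
            break
        fp = prev_fp
    return trace


# Three nested frames on a simulated stack at 0x1000..0x1100.
memory = {
    0x1040: 0x1080, 0x1048: 0x200,  # innermost frame
    0x1080: 0x10C0, 0x1088: 0x300,
    0x10C0: 0x0,    0x10C8: 0x400,  # outermost frame, chain ends here
}
trace = unwind(memory, pc=0x100, fp=0x1040, stack_start=0x1000, stack_size=0x100)
print([hex(addr) for addr in trace])
```

A frame pointer that falls outside the thread stack stops the walk immediately, so a corrupted or leaf-function frame yields a short trace rather than a wild memory read.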
The samples obtained can be translated into a flame graph using the stackcollapse.py script.
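As a rough illustration of the collapsing step, the sketch below turns stack samples into the folded format that flamegraph.pl reads: frames joined by `;` from the root outwards, followed by a space and the sample count. It is not the stackcollapse.py from this PR. The function name `fold` and the sample data are hypothetical, and symbol resolution against zephyr.elf is skipped by assuming the frames are already function names.

```python
from collections import Counter


def fold(samples):
    """Collapse stack samples into folded lines: 'root;...;leaf count'.

    Each sample lists frames innermost first, as an unwinder would produce them.
    """
    counts = Counter(";".join(reversed(stack)) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: two hits in work() called from main(), one in idle().
samples = [["work", "main"], ["work", "main"], ["idle"]]
folded = fold(samples)
print("\n".join(folded))
```

Identical stacks are merged into a single line, which is what lets the flame graph draw frame widths in proportion to sample counts.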
[Figure: flame graph example, generated from the echo_server sample]
Kconfig options:
- CONFIG_PERF enables the perf subsystem in compilation.
- CONFIG_PERF_BUFFER_SIZE specifies the size of the buffer where samples are stored.
Kconfig requirements:
- CONFIG_THREAD_STACK_INFO=y provides the start and size values of the thread stack.
- CONFIG_SMP=n: SMP support is not implemented yet.
- CONFIG_FRAME_POINTER=y provides the frame pointer.
- CONFIG_SHELL=y is needed because the subsystem provides shell commands.
- CONFIG_RISCV=y || CONFIG_X86=y: the subsystem is implemented only for x86 and riscv so far.